Prosper Data by Snah Desai

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

# Univariate Plots Section

There are 81 variables with 113937 observations.

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 
## [1] 113937     81
## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
# Univariate Analysis
### What is the structure of your dataset?
Overall there are 113937 observations in the file and 81 variables. Among them I focused on a few variables, nearly all of them numeric in the `str` output above: Monthly Loan Payment, Debt to Income Ratio, Employment Status Duration, Prosper Score, Term, Borrower Rate, Borrower APR, Estimated Return, Stated Monthly Income and Lender Yield.

### What is/are the main feature(s) of interest in your dataset?
I am looking at variables that will affect the credit score of an individual looking to borrow. This will also dictate what rate and what term the user will be able to borrow at through the service.

### What other features in the dataset do you think will help support your investigation into your feature(s) of interest?
Monthly Income and Loan Payment, Interest Rates, Prosper Score and Term.

### Did you create any new variables from existing variables in the dataset?
Yes?

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
Many graphs had their peaks to the right of the
# Univariate Section
The result shows the number of “NA” values in each column of the data
```r
colSums(is.na(pf))
```
##                          ListingKey                       ListingNumber
##                                   0                                   0
##                 ListingCreationDate                         CreditGrade
##                                   0                                   0
##                                Term                          LoanStatus
##                                   0                                   0
##                          ClosedDate                         BorrowerAPR
##                                   0                                  25
##                        BorrowerRate                         LenderYield
##                                   0                                   0
##             EstimatedEffectiveYield                       EstimatedLoss
##                               29084                               29084
##                     EstimatedReturn             ProsperRating..numeric.
##                               29084                               29084
##               ProsperRating..Alpha.                        ProsperScore
##                                   0                               29084
##           ListingCategory..numeric.                       BorrowerState
##                                   0                                   0
##                          Occupation                    EmploymentStatus
##                                   0                                   0
##            EmploymentStatusDuration                 IsBorrowerHomeowner
##                                7625                                   0
##                    CurrentlyInGroup                            GroupKey
##                                   0                                   0
##                    DateCreditPulled               CreditScoreRangeLower
##                                   0                                 591
##               CreditScoreRangeUpper             FirstRecordedCreditLine
##                                 591                                   0
##                  CurrentCreditLines                     OpenCreditLines
##                                7604                                7604
##          TotalCreditLinespast7years               OpenRevolvingAccounts
##                                 697                                   0
##         OpenRevolvingMonthlyPayment                InquiriesLast6Months
##                                   0                                 697
##                      TotalInquiries                CurrentDelinquencies
##                                1159                                 697
##                    AmountDelinquent             DelinquenciesLast7Years
##                                7622                                 990
##            PublicRecordsLast10Years           PublicRecordsLast12Months
##                                 697                                7604
##              RevolvingCreditBalance                 BankcardUtilization
##                                7604                                7604
##             AvailableBankcardCredit                         TotalTrades
##                                7544                                7544
##  TradesNeverDelinquent..percentage.             TradesOpenedLast6Months
##                                7544                                7544
##                   DebtToIncomeRatio                         IncomeRange
##                                8554                                   0
##                    IncomeVerifiable                 StatedMonthlyIncome
##                                   0                                   0
##                             LoanKey                   TotalProsperLoans
##                                   0                               91852
##          TotalProsperPaymentsBilled               OnTimeProsperPayments
##                               91852                               91852
## ProsperPaymentsLessThanOneMonthLate     ProsperPaymentsOneMonthPlusLate
##                               91852                               91852
##            ProsperPrincipalBorrowed         ProsperPrincipalOutstanding
##                               91852                               91852
##         ScorexChangeAtTimeOfListing           LoanCurrentDaysDelinquent
##                               95009                                   0
##       LoanFirstDefaultedCycleNumber          LoanMonthsSinceOrigination
##                               96985                                   0
##                          LoanNumber                  LoanOriginalAmount
##                                   0                                   0
##                 LoanOriginationDate              LoanOriginationQuarter
##                                   0                                   0
##                           MemberKey                  MonthlyLoanPayment
##                                   0                                   0
##                 LP_CustomerPayments        LP_CustomerPrincipalPayments
##                                   0                                   0
##                  LP_InterestandFees                      LP_ServiceFees
##                                   0                                   0
##                   LP_CollectionFees               LP_GrossPrincipalLoss
##                                   0                                   0
##                 LP_NetPrincipalLoss     LP_NonPrincipalRecoverypayments
##                                   0                                   0
##                       PercentFunded                     Recommendations
##                                   0                                   0
##          InvestmentFromFriendsCount         InvestmentFromFriendsAmount
##                                   0                                   0
##                           Investors
##                                   0
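Raw NA counts are easier to compare across columns when expressed as a share of rows. The following is a minimal base R sketch of that idea, run on a small made-up data frame; `toy` and its values are hypothetical stand-ins for `pf`, not Prosper data.

```r
# Toy stand-in for `pf` (hypothetical values, not the Prosper data).
toy <- data.frame(
  BorrowerAPR     = c(0.12, NA, 0.30, 0.25),
  EstimatedReturn = c(NA, NA, 0.09, NA),
  Term            = c(36, 36, 60, 12)
)

# Number of missing values per column, same call as used on `pf`.
na_count <- colSums(is.na(toy))

# Share of missing values per column, largest first.
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)

print(na_count)
print(na_share)
```

Applied to `pf`, the same two calls would put the columns with 91852 or more missing values at the top of the list, which makes it easier to see which variables cover only a subset of listings.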
In the code below we find the most popular loan term lengths, which in order are 36, 60 and 12 months.
```r
# Treat Term as a category rather than a number, then count loans per term
pf$Term <- factor(pf$Term)
table(pf$Term)
```
## 
##    12    36    60 
##  1614 87778 24545
```r
ggplot(aes(x = Term), data = pf) +
  geom_bar()
```
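The term counts are easier to read as percentages of all loans. A small sketch, with the three counts copied by hand from the `table(pf$Term)` output above (the names `term_counts` and `term_share` are introduced here for illustration only):

```r
# Term counts copied from the table(pf$Term) output above.
term_counts <- c("12" = 1614, "36" = 87778, "60" = 24545)

# Share of all loans per term, in percent, rounded to one decimal place.
term_share <- round(100 * term_counts / sum(term_counts), 1)

print(term_share)
```

With the data frame loaded, `prop.table(table(pf$Term))` should give the same proportions directly.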
Below is the histogram of borrower rates. The highest peak is around 0.32, and the histogram is right skewed.
```r
qplot(BorrowerRate, data = pf, geom = "histogram", binwidth = .005)
```
The most popular borrower rate is 0.3177.
```r
# Count loans per distinct rate, most common first
pf %>%
  group_by(BorrowerRate) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))
```
## # A tibble: 2,294 × 2
##    BorrowerRate Count
##           <dbl> <int>
## 1        0.3177  3672
## 2        0.3500  1905
## 3        0.3199  1651
## 4        0.2900  1508
## 5        0.2699  1319
## 6        0.1500  1182
## 7        0.1400  1035
## 8        0.1099   949
## 9        0.2000   907
## 10       0.1585   806
## # ... with 2,284 more rows
Below is the histogram for borrower APR, which shows a peak just above 0.35.
```r
qplot(BorrowerAPR, data = pf, geom = "histogram", binwidth = .001)
```
## Warning: Removed 25 rows containing non-finite values (stat_bin).
As the histogram above showed visually, and the summary count below confirms, the most popular APR is 0.35797.
```r
# Count loans per distinct APR, most common first
pf %>%
  group_by(BorrowerAPR) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count))
```
## # A tibble: 6,678 × 2
##    BorrowerAPR Count
##          <dbl> <int>
## 1      0.35797  3672
## 2      0.35643  1644
## 3      0.37453  1260
## 4      0.30532   902
## 5      0.29510   747
## 6      0.35356   721
## 7      0.29776   707
## 8      0.15833   652
## 9      0.24246   605
## 10     0.24758   601
## # ... with 6,668 more rows
Histogram of Lender Yield
qplot(LenderYield, data = pf, geom = "histogram", binwidth = .001)
Summary of Lender Yield, the return (profit) earned by the person lending
pf %>% group_by(LenderYield) %>% summarise(Count=n()) %>% arrange(desc(Count))
## # A tibble: 2,283 × 2
##    LenderYield Count
##          <dbl> <int>
## 1       0.3077  3672
## 2       0.3400  1916
## 3       0.3099  1651
## 4       0.2599  1318
## 5       0.1450  1011
## 6       0.1300   999
## 7       0.0999   955
## 8       0.1400   877
## 9       0.1485   801
## 10      0.1199   779
## # ... with 2,273 more rows
Average Lender Yield is .1827
summary(pf$LenderYield)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925
Below is a histogram for Estimated Returns. As we can see, the graph is skewed, with outliers below 0 and above .2.
qplot(EstimatedReturn, data = pf, geom = "histogram", binwidth = .001)
## Warning: Removed 29084 rows containing non-finite values (stat_bin).
The most common estimated return is .12460 (2,217 loans), followed by .14870 (1,097 loans); 29,084 loans have no estimated return (NA).
pf %>% group_by(EstimatedReturn) %>% summarise(Count=n()) %>% arrange(desc(Count))
## # A tibble: 1,477 × 2
##    EstimatedReturn Count
##              <dbl> <int>
## 1               NA 29084
## 2          0.12460  2217
## 3          0.14870  1097
## 4          0.06910   760
## 5          0.14140   738
## 6          0.10740   654
## 7          0.11500   611
## 8          0.07713   565
## 9          0.06706   556
## 10         0.09050   538
## # ... with 1,467 more rows
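
Because the NA group is the largest row in the count table, it is easy to misread which value is actually the most common. The sketch below uses a small invented vector in place of `pf$EstimatedReturn` to show one way of finding the most frequent non-missing value in base R.

```r
# Toy vector standing in for pf$EstimatedReturn (values are invented).
est_return <- c(NA, 0.1246, 0.1487, 0.1246, NA, NA, 0.0691, 0.1246)

# table() drops NA by default, so the NA group cannot win the count.
counts <- table(est_return)
most_common <- as.numeric(names(counts)[which.max(counts)])
print(most_common)

# Number of missing values, reported separately.
n_missing <- sum(is.na(est_return))
print(n_missing)
```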
The average estimated return is .096.
summary(pf$EstimatedReturn)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -0.183   0.074   0.092   0.096   0.117   0.284   29084
The histogram of Employment Status Duration below is right skewed, with no second peak in the right tail.
ggplot(aes(x = EmploymentStatusDuration), data=pf) + geom_histogram(bins=100)
## Warning: Removed 7625 rows containing non-finite values (stat_bin).
Below is a histogram of the loan amounts issued, showing a right skewed graph.
ggplot(aes(x = LoanOriginalAmount), data = pf) + geom_histogram(bins=10)
Prosper scores range from 1 to 11, with 1 being the highest risk and 11 the lowest. The bar graph shows the distribution of scores.
ggplot(aes(x = ProsperScore), data = pf) + geom_bar()
## Warning: Removed 29084 rows containing non-finite values (stat_count).
The table shows how many people hold each score
table(pf$ProsperScore)
## 
##     1     2     3     4     5     6     7     8     9    10    11 
##   992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456
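
To put a number on how concentrated the scores are in the middle of the range, the counts above can be re-entered by hand and summed. This is only a sketch on the printed counts; `score_counts` is an illustrative name and nothing here reads from `pf`.

```r
# Counts copied from the table(pf$ProsperScore) output above.
score_counts <- c(992, 5766, 7642, 12595, 9813, 12278, 10597, 12053,
                  6911, 4750, 1456)
names(score_counts) <- 1:11

# Loans that have a Prosper Score at all.
total_scored <- sum(score_counts)

# Fraction of scored loans with a score from 4 to 8 inclusive.
share_4_to_8 <- sum(score_counts[as.character(4:8)]) / total_scored
print(total_scored)
print(round(share_4_to_8, 3))
```

The total should match the number of rows left after the 29084 missing scores reported in the warning are removed.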
Below is the histogram of Employment Status Duration, in months. We can see that after the sqrt is applied there is a clear right skew.
ggplot(aes(x = EmploymentStatusDuration), data = pf) + geom_histogram(bins=100)
## Warning: Removed 7625 rows containing non-finite values (stat_bin).
ggplot(aes(x = EmploymentStatusDuration), data = pf) + geom_histogram(bins=100) + scale_x_sqrt()
## Warning: Removed 7625 rows containing non-finite values (stat_bin).
Below is a summary of the length in months of the employment status at the time the listing was created.
summary(pf$EmploymentStatusDuration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   26.00   67.00   96.07  137.00  755.00    7625
Below is a summary of the Debt to Income Ratios for borrowers.
summary(pf$DebtToIncomeRatio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554
Below is the debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%).
ggplot(aes(x = DebtToIncomeRatio), data=subset(pf,!is.na(DebtToIncomeRatio))) + geom_histogram(bins=100)
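
The capping rule described above explains why the maximum in the summary is exactly 10.01. A minimal sketch of such a rule follows; `cap_dti` and the input values are hypothetical and only illustrate the description, not how Prosper actually computes the field.

```r
# cap_dti is an illustrative helper, not Prosper's code.
cap_dti <- function(ratio) {
  # Ratios above 10 (1000%) are reported as 10.01; NA stays NA.
  ifelse(ratio > 10, 10.01, ratio)
}

# Invented raw ratios, including one above the cap and one missing.
raw <- c(0.18, 0.22, 9.5, 14.2, NA)
print(cap_dti(raw))
```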
Table view of the Debt to Income Ratios
pf %>% group_by(DebtToIncomeRatio) %>% summarise(Count=n()) %>% arrange(desc(Count))
## # A tibble: 1,208 × 2
##    DebtToIncomeRatio Count
##                <dbl> <int>
## 1                 NA  8554
## 2               0.18  4132
## 3               0.22  3687
## 4               0.17  3616
## 5               0.14  3553
## 6               0.20  3481
## 7               0.16  3442
## 8               0.19  3392
## 9               0.15  3338
## 10              0.21  3226
## # ... with 1,198 more rows
Summary of the Monthly Loan Payments by borrowers
summary(pf$MonthlyLoanPayment)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2252.0
Histogram of Monthly Loan Payments
ggplot(aes(x = MonthlyLoanPayment), data = pf) + geom_histogram(bins=100)
Instead of using Credit Score Range Lower and Credit Score Range Upper I combined the two with average and created the Credit Score Range Mid. This is what I used to plot the histogram of Credit Scores.
```r
# Midpoint of the reported credit score range
pf$CSRangeMid <- (pf$CreditScoreRangeLower + pf$CreditScoreRangeUpper) / 2
ggplot(aes(x = CSRangeMid), data = pf) + geom_histogram(bins = 100)
```
## Warning: Removed 591 rows containing non-finite values (stat_bin).
summary(pf$CreditScoreRangeLower)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591
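
The 591 rows dropped from the histogram line up with the 591 missing values in the lower bound: a midpoint computed from a missing bound is itself missing. The sketch below shows this on a few invented values (`lower`, `upper` and `mid` are illustrative names).

```r
# Invented bounds; the third row has both bounds missing.
lower <- c(660, 680, NA, 720)
upper <- c(679, 699, NA, 739)

# Element-wise midpoint; NA in either bound yields NA in the result.
mid <- (lower + upper) / 2
print(mid)
print(sum(is.na(mid)))
```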
The histogram of StatedMonthlyIncome is right skewed, mainly due to some outliers, as outlined in the summary.
After applying log10 to the y-axis the graph better shows the right skew.
The summary of StatedMonthlyIncome shows a big difference between the 3rd quartile and the maximum value, which shows there are large outliers in the data.
qplot(data = pf, x = StatedMonthlyIncome) + xlim(c(0, 50000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 83 rows containing non-finite values (stat_bin).
ggplot(aes(x = StatedMonthlyIncome), data=subset(pf, StatedMonthlyIncome < 50000)) + geom_histogram(bins = 100) + scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 3 rows containing missing values (geom_bar).
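
The two warnings above most likely come from empty bins: a bin with a count of 0 has no finite position on a log10 axis, so it is dropped. This is an interpretation rather than something the output states, and the bin counts below are invented to show the arithmetic.

```r
# Bin counts for an invented histogram; the middle bin is empty.
bin_counts <- c(120, 0, 3)

# log10(0) is -Inf, which a log-scaled axis cannot draw.
log_counts <- log10(bin_counts)
print(log_counts)
print(sum(is.infinite(log_counts)))
```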
summary(pf$StatedMonthlyIncome)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000
The table shows that there are 1140 people with monthly income greater than 20526.67. Most people's income is around 4667 (the median); the single most common value is 4166.667.
pf %>% group_by(StatedMonthlyIncome) %>% summarise(Count=n()) %>% arrange(desc(Count))
## # A tibble: 13,502 × 2
##    StatedMonthlyIncome Count
##                  <dbl> <int>
## 1             4166.667  3526
## 2             5000.000  3389
## 3             3333.333  2917
## 4             3750.000  2428
## 5             5416.667  2374
## 6             5833.333  2319
## 7             6250.000  2276
## 8             2500.000  2256
## 9             4583.333  2211
## 10            6666.667  2162
## # ... with 13,492 more rows
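
One common way to decide what counts as an outlier is Tukey's 1.5 × IQR rule. It is not the method used in this analysis; it is shown here only as a sketch of how the quartiles from the summary could be turned into a cutoff. The incomes in the example vector are picked for illustration.

```r
# Quartiles copied from summary(pf$StatedMonthlyIncome) above.
q1 <- 3200
q3 <- 6825

# Tukey's rule of thumb: flag values more than 1.5 * IQR above Q3.
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
print(upper_fence)

# Applied to a few illustrative incomes, including the reported maximum.
incomes <- c(2500, 4667, 12000, 20526.67, 1750000)
print(incomes[incomes > upper_fence])
```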

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

What would seem like an obvious observation is the relationship between Prosper Score and Borrower Rate. In the jitter plot below you can see that as Prosper Scores increase the Borrower Rate decreases. The same applies to Credit Scores: as an individual's credit score increases, there is a general trend that their Borrower Rate will decrease.

A key indication I came across was that as Prosper Scores increase, an individual's Credit Score also increases, which means they will not only be able to borrow at a better rate but also have a larger potential loan amount.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

As Borrower Rate increased so did the Estimated Return. People with greater income have better credit scores, which lets them take on larger loan payments and larger overall loan amounts.

What was the strongest relationship you found?

The strongest relationships were between Monthly Loan Payment and Loan Amount, and between Borrower Rate and Prosper Score.

Bivariate Plots Section

From the jitter plot we can see that the lower the Prosper Score, the higher the Borrower Rate.

## Warning: Removed 29084 rows containing missing values (geom_point).

ggplot(aes(factor(ProsperScore), BorrowerRate),  data = pf) +
  geom_jitter( alpha = .05)  +
  geom_boxplot( alpha = .5,color = 'blue')+
  stat_summary(fun = "mean",  # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
               geom = "point", 
               color = "red", 
               shape = 8, 
               size = 4)+
  geom_smooth(aes(ProsperScore, 
                    BorrowerRate),
                method = "lm", 
                se = FALSE,size=2)
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).

Below is a scatter plot mapping Borrower Rate against Estimated Return. We can see that Borrower Rate and Estimated Return are positively correlated.

ggplot(aes(y = BorrowerRate,x = EstimatedReturn), data = pf) +
    geom_jitter(alpha=0.01,size=2) 
## Warning: Removed 29084 rows containing missing values (geom_point).
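
A correlation coefficient can back up what the scatter plot suggests. The sketch below runs on invented data, not on `pf`, and shows how `cor()` can skip rows with missing values, which matters here because many Estimated Return values are NA.

```r
# Invented data: return rises with rate, plus noise; some returns are missing.
set.seed(1)
rate <- runif(200, min = 0.05, max = 0.35)
est_return <- 0.02 + 0.3 * rate + rnorm(200, sd = 0.01)
est_return[sample(200, 50)] <- NA

# Pearson correlation using only rows where both values are present.
r <- cor(rate, est_return, use = "complete.obs")
print(round(r, 2))
```

On the real data the equivalent call would be `cor(pf$BorrowerRate, pf$EstimatedReturn, use = "complete.obs")`.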

From the jitter plot below we can see that Credit Score and Borrower Rate are negatively correlated.

ggplot(aes(y = BorrowerRate,x = CSRangeMid), 
         data = subset(pf,CSRangeMid>300)) +
    geom_jitter(alpha=0.01,size=2)

Individuals with the best Credit Scores have a Prosper Score of 10, and there is a drop-off before and after.

ggplot(aes(y = CSRangeMid,x = ProsperScore), data = pf) +
    geom_jitter(alpha=0.01,size=2)
## Warning: Removed 29084 rows containing missing values (geom_point).

summary(pf$CreditScoreRangeUpper)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    19.0   679.0   699.0   704.6   739.0   899.0     591

We can observe a positive relationship between Loan Amount and Monthly Payment. As the Loan Amount increases, so does the Monthly Payment.

ggplot(aes(x = LoanOriginalAmount, 
           y = MonthlyLoanPayment  , color = factor(ProsperScore)), 
       data = pf) +
      geom_point(alpha = 0.8, size = 2) +
      geom_smooth(method = "lm", se = FALSE,size=1)  +
  scale_color_brewer(type='seq',
                   guide=guide_legend(title='ProsperScore'))
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors
## Warning: Removed 35290 rows containing missing values (geom_point).
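
The near-linear relationship is what a fixed-rate amortization formula would predict: for a given rate and term, the payment is proportional to the amount borrowed. The sketch below assumes that standard formula; how Prosper actually computes MonthlyLoanPayment is not stated in this report, and `monthly_payment` is a hypothetical helper.

```r
# Standard fixed-rate amortization formula (an assumption about how the
# payment is derived; the dataset documentation is not quoted here).
monthly_payment <- function(principal, annual_rate, term_months) {
  i <- annual_rate / 12                      # monthly interest rate
  if (i == 0) return(principal / term_months)
  principal * i / (1 - (1 + i)^(-term_months))
}

# Same rate and term: doubling the amount doubles the payment.
print(monthly_payment(5000, 0.18, 36))
print(monthly_payment(10000, 0.18, 36))
```

Under this assumption, the different slopes per Prosper Score in the plot would reflect different rates and terms.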

Individuals who Completed their loans have greater incomes than those who Defaulted (median 4417 versus 3708).

by(pf$StatedMonthlyIncome, pf$LoanStatus, summary)
## pf$LoanStatus: Cancelled
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2445    2600    2609    3833    4167 
## -------------------------------------------------------- 
## pf$LoanStatus: Chargedoff
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2500    3750    4486    5500  208300 
## -------------------------------------------------------- 
## pf$LoanStatus: Completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2917    4417    5325    6583  618500 
## -------------------------------------------------------- 
## pf$LoanStatus: Current
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3667    5167    6153    7447 1750000 
## -------------------------------------------------------- 
## pf$LoanStatus: Defaulted
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2500    3708    4367    5417   58620 
## -------------------------------------------------------- 
## pf$LoanStatus: FinalPaymentInProgress
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1167    3583    5250    6312    8333   32920 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (>120 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3115    3750    3727    4500    6667 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (1-15 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3167    4667    5554    6948   35420 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (16-30 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3250    4583    5484    6500   30000 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (31-60 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2938    4583    5436    7083   25000 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (61-90 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3167    4583    5323    6594   31250 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (91-120 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3073    4171    4816    5833   22920
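
Since the maxima in the summaries above are dominated by a few very large incomes, comparing medians per status is a more robust way to read the table. The sketch below does that with `tapply()` on a small invented data frame; `loans` stands in for `pf`.

```r
# Invented rows standing in for pf; only the two columns used here.
loans <- data.frame(
  LoanStatus = c("Completed", "Completed", "Completed",
                 "Defaulted", "Defaulted", "Defaulted"),
  StatedMonthlyIncome = c(3000, 4500, 250000, 2500, 3700, 5400)
)

# Median income per status; the median is not pulled up by the 250000 outlier.
med_income <- tapply(loans$StatedMonthlyIncome, loans$LoanStatus, median)
print(med_income)
```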

Completed loans tend to carry lower Borrower Rates than Defaulted or Chargedoff loans (median 0.1744 versus 0.2296 and 0.2400).

by(pf$BorrowerRate, pf$LoanStatus, summary)
## pf$LoanStatus: Cancelled
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1075  0.1395  0.2000  0.1844  0.2375  0.2375 
## -------------------------------------------------------- 
## pf$LoanStatus: Chargedoff
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1769  0.2400  0.2354  0.2975  0.4500 
## -------------------------------------------------------- 
## pf$LoanStatus: Completed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1173  0.1744  0.1864  0.2511  0.4975 
## -------------------------------------------------------- 
## pf$LoanStatus: Current
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0577  0.1314  0.1760  0.1838  0.2310  0.3304 
## -------------------------------------------------------- 
## pf$LoanStatus: Defaulted
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1650  0.2296  0.2231  0.2875  0.4975 
## -------------------------------------------------------- 
## pf$LoanStatus: FinalPaymentInProgress
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0629  0.1299  0.1899  0.1970  0.2712  0.3199 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (>120 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1449  0.2079  0.2551  0.2527  0.3060  0.3199 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (1-15 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0749  0.1870  0.2317  0.2308  0.2859  0.3435 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (16-30 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0599  0.1899  0.2419  0.2353  0.2909  0.3304 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (31-60 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0649  0.1855  0.2468  0.2330  0.2870  0.3304 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (61-90 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0659  0.1914  0.2468  0.2400  0.2999  0.3304 
## -------------------------------------------------------- 
## pf$LoanStatus: Past Due (91-120 days)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0766  0.1850  0.2495  0.2383  0.2952  0.3435

From the jitter plot below we can see that, in general, people with good credit scores have greater loan amounts.

ggplot(aes(x = CSRangeMid,y = LoanOriginalAmount), 
       data = subset(pf,CSRangeMid > 350)) +
  geom_jitter(alpha = .05, size = 2)

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

First, people with higher Prosper Scores tend to pay back their loans quicker. Things such as the individual's monthly income, credit score, loan amount and periodic loan payment are contributing factors to Prosper Score, but there is no strong relation between Prosper Score and Borrower Rate.

Were there any interesting or surprising interactions between features?

Interestingly enough, most people fulfilled their loan payments. Another interesting thing I found was that some people paid their loans with 0 reported income.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Multivariate Plots Section

Below is the jitter plot between Loan Original Amount and Monthly Loan Payment colored by Prosper Score.

An interesting finding to note is that most of the people who are currently paying their loans are self-employed.

ggplot(aes(y = BorrowerRate, x = ProsperScore, color = EmploymentStatus),
       data=subset(pf,EmploymentStatus != "" & 
                     EmploymentStatus != "Employed" &
                     EmploymentStatus != "Other" &
                     !is.na(ProsperScore))) +
         geom_jitter(alpha=1, size=2) +
  scale_color_brewer(type='qual')

Similar to the graph above, the one below demonstrates the relationship between Employment Status, Borrower Rate and Prosper Score, but in the form of a box plot.

ggplot(aes(y = BorrowerRate, x = factor(ProsperScore), fill = EmploymentStatus),
       data=subset(pf,EmploymentStatus != "" & 
                     EmploymentStatus != "Employed" &
                     EmploymentStatus != "Other" &
                     !is.na(ProsperScore))) +
         geom_boxplot( ) +
  scale_fill_brewer(type='qual')  # the boxes use the fill aesthetic, so set the fill scale
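
The chained `!=` conditions used in the two plots above can be written more compactly with `%in%`. This is a sketch on an invented data frame (`loans` and `excluded` are illustrative names); it is meant to select the same kind of rows, not to reproduce the plots.

```r
# Invented rows standing in for pf.
loans <- data.frame(
  EmploymentStatus = c("Employed", "Self-employed", "", "Other",
                       "Retired", "Full-time"),
  ProsperScore = c(7, 5, 6, NA, 8, 4)
)

# Statuses left out of the plots above.
excluded <- c("", "Employed", "Other")

# Keep rows whose status is not excluded and whose score is present.
kept <- subset(loans, !(EmploymentStatus %in% excluded) & !is.na(ProsperScore))
print(kept$EmploymentStatus)
```

Keeping the excluded statuses in one vector also makes it easier to reuse the same filter for both plots.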

Below is a jitter plot of Monthly Loan Payment and Credit Score Range Mid colored by Prosper Score.

ggplot(aes(x = MonthlyLoanPayment, y = CSRangeMid, 
           color = ProsperScore), data = pf) +
  geom_jitter(alpha=0.5, size=2) 
## Warning: Removed 591 rows containing missing values (geom_point).


Final Plots and Summary

Plot One

Description One

This graph shows ProsperScore and BorrowerRate. It shows that Borrower Rate decreases as ProsperScore increases. The graph demonstrates a clear relationship: Prosper Score affects one's ability to borrow and, more than that, the rate at which one can borrow. As this is a major factor in someone's loan consideration, it is key to establish the relationship and then look further into what goes into it.

Plot Two

Description Two

Above is a histogram of ProsperScores, and its distribution is roughly normal. As we can see, most scores are between 4 and 8. This histogram shows the distribution of scores and serves as a starting point for further analysis and validation of what affects the Prosper and credit scores.

Plot Three

Description Three

The graph shows that people with better Prosper Scores and Credit Scores have larger stated monthly incomes, and the opposite also applies. As Credit Score increases so does the Stated Monthly Income. This was done to understand whether, given that an individual's Credit Score goes into defining their Prosper Score, there is also a change in their income. As it relates to my initial question, I want to understand the factors that ultimately go into an individual's Credit Score and Prosper Score, and one of the biggest factors that is often assumed is income.


Reflection

With a total of 113937 data points and 81 variables I first removed “NA” Variables as they may get in the way of my analysis. To target the points I was interested in, specifically how Prosper Score is determined and how it relates to Credit Score, I looked at variables such as StatedMonthlyIncome, CreditScore, BorrowerRate, BorrowerAPR and ProsperScore. There were also outliers in the StatedMonthlyIncome variable.

Some interesting things I found during analysis were that people who have good ProsperScores tend to have a lower BorrowerRate. The same applies to Credit Scores: as an individual's credit score increases, there is a general trend that their Borrower Rate will decrease. I also observed that as someone's Prosper Score increases, their Credit Score also increases, which means they will not only be able to borrow at a better rate but also have a larger potential loan amount.

All said and done, the variables that affect ProsperScore are the same ones that affect CreditScore, and those same variables in turn affect an individual's BorrowerRate. In particular, an individual's income and pre-existing credit score do have an effect on the final borrower rate, Prosper Score and other things such as loan amount and term.

There are a great many interesting variables to be explored, cleaned and interpreted. That being said, there was a lot that I was unable to explore, such as which areas had the most loans issued and their amounts. I would also be interested in seeing which loan amounts had the highest percentage of completed loans, combined with which type of loan has the highest percentage of completion.